Difference-of-Convex Learning: Directional Stationarity, Optimality, and Sparsity
Authors
Abstract
This paper studies a fundamental bicriteria optimization problem for variable selection in statistical learning; the two criteria are a loss/residual function and a model-control function (also called a regularizer or penalty). The former measures how well the learning model fits the data, and the latter controls the complexity of the model. We focus on the case where the loss function is (strongly) convex and the model-control function is a difference-of-convex (dc) sparsity measure. The paper establishes fundamental optimality and sparsity properties of directional stationary solutions to a nonconvex Lagrangian formulation of the bicriteria problem, based on a specially structured dc representation of many well-known sparsity functions that can be profitably exploited in the analysis. We relate the Lagrangian optimization problem to the penalty-constrained problem in terms of their respective d(irectional)-stationary solutions; this contrasts with common analyses that pertain to the (global) minimizers of the problem, which are not computable due to nonconvexity. Most importantly, we provide sufficient conditions under which the d-stationary solutions of the nonconvex Lagrangian formulation are global minimizers (possibly restricted due to nondifferentiability), thereby filling the gap between previous minimizer-based analysis and practical computational considerations. The established relation allows us to readily transfer the results derived for the Lagrangian formulation to the penalty-constrained formulation. Specializations of the conditions to exact and surrogate sparsity functions are discussed, yielding optimality and sparsity results for existing nonconvex formulations of the statistical learning problem.
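To make the dc structure concrete, the sketch below applies the classical difference-of-convex algorithm (DCA), not an algorithm from the paper itself, to a least-squares loss with the capped-ℓ1 surrogate, using the split min(|t|, a) = |t| − max(|t| − a, 0). All names, dimensions, and parameter values are illustrative assumptions; note also that DCA fixed points are in general only critical points, a weaker notion than the d-stationary solutions analyzed in the paper.

```python
# Minimal DCA sketch (an assumption, not the paper's method) for the Lagrangian
# formulation  min_x 0.5*||A x - b||^2 + lam * sum_i min(|x_i|, a),
# with the dc split  min(|t|, a) = |t| - max(|t| - a, 0).
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*||.||_1 (soft thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def dca_capped_l1(A, b, lam, a, n_outer=30, n_inner=200):
    x = np.zeros(A.shape[1])
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth gradient
    for _ in range(n_outer):
        # Subgradient of the concave part h(x) = sum_i max(|x_i| - a, 0) at x.
        s = np.sign(x) * (np.abs(x) > a)
        # Convex subproblem 0.5||Ax-b||^2 - lam*<s, x> + lam*||x||_1, solved by ISTA.
        for _ in range(n_inner):
            grad = A.T @ (A @ x - b) - lam * s
            x = soft_threshold(x - grad / L, lam / L)
    return x

# Illustrative usage: recover a sparse vector from noisy linear measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = np.zeros(100); x_true[:5] = 3.0
b = A @ x_true + 0.01 * rng.standard_normal(50)
x_hat = dca_capped_l1(A, b, lam=0.5, a=1.0)
print("nonzeros recovered:", np.sum(np.abs(x_hat) > 1e-6))
```

Each outer step linearizes the concave part at the current iterate and solves the resulting convex ℓ1-penalized subproblem by ISTA.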
Similar articles
Point Source Super-resolution Via Non-convex L1 Based Methods
We study the super-resolution (SR) problem of recovering point sources, consisting of a collection of isolated and suitably separated spikes, from low-frequency measurements only. If the peak separation is above a factor in (1, 2) times the Rayleigh length (the physical resolution limit), L1 minimization is guaranteed to recover such sparse signals. However, below this critical length scale, especia...
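As a rough illustration of the guaranteed regime, here is a hedged sketch that recovers well-separated spikes on a grid from low-pass Fourier coefficients by ℓ1 minimization (in LASSO form, solved with ISTA); the grid size, cutoff frequency, and penalty level are assumptions, not values from the paper.

```python
# Hedged sketch: L1 recovery of well-separated spikes from low-frequency
# Fourier measurements. The Rayleigh length here is roughly n / f_c.
import numpy as np

n, f_c = 200, 15                            # grid points, frequency cutoff (assumed)
freqs = np.arange(-f_c, f_c + 1)
F = np.exp(-2j * np.pi * np.outer(freqs, np.arange(n)) / n)  # low-pass DFT rows

x_true = np.zeros(n)
x_true[[40, 90, 150]] = [1.0, -0.7, 1.3]    # spikes separated well above n / f_c
y = F @ x_true

lam, L = 0.01, np.linalg.norm(F, 2) ** 2
x = np.zeros(n)
for _ in range(2000):                       # ISTA for min 0.5||Fx-y||^2 + lam||x||_1
    grad = (F.conj().T @ (F @ x - y)).real
    z = x - grad / L
    x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
print("recovered support near true spikes:", np.where(np.abs(x) > 0.05)[0])
```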
Sparsity Constrained Nonlinear Optimization: Optimality Conditions and Algorithms
This paper treats the problem of minimizing a general continuously differentiable function subject to sparsity constraints. We present and analyze several different optimality criteria which are based on the notions of stationarity and coordinate-wise optimality. These conditions are then used to derive three numerical algorithms aimed at finding points satisfying the resulting optimality crite...
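For concreteness, a common algorithm for problems of this form is iterative hard thresholding, which projects a gradient step onto the sparsity constraint set. The sketch below uses a least-squares loss and an illustrative step size; it is a generic instance of this idea, not necessarily one of the paper's three algorithms.

```python
# Minimal iterative-hard-thresholding (IHT) sketch for
# min f(x) subject to ||x||_0 <= s, with f(x) = 0.5*||Ax - b||^2 assumed here.
import numpy as np

def hard_threshold(x, s):
    """Euclidean projection onto {x : ||x||_0 <= s}: keep the s largest entries."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-s:]
    out[idx] = x[idx]
    return out

def iht(A, b, s, step=None, n_iter=500):
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2   # conservative step (assumption)
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = hard_threshold(x - step * A.T @ (A @ x - b), s)
    return x
```

Fixed points of this iteration satisfy stationarity conditions of the coordinate-wise type studied in such analyses.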
Optimality and Duality for an Efficient Solution of Multiobjective Nonlinear Fractional Programming Problem Involving Semilocally Convex Functions
In this paper, the problem under consideration is a multiobjective nonlinear fractional programming problem involving semilocally convex and related functions. We discuss the interrelation between the solution sets involving properly efficient solutions of the multiobjective fractional programming problem and a corresponding scalar fractional programming problem. Necessary and sufficient optimality...
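As a toy illustration of the multiobjective-to-scalar relation (not the paper's construction), the sketch below minimizes a positively weighted combination of two ratio objectives over a box; the functions, weights, and feasible set are all assumptions.

```python
# Hedged sketch: weighted scalarization of a two-objective fractional program
# min (f1/g1, f2/g2) over a box. With strictly positive weights, a minimizer of
# the scalarized problem is an efficient point of the vector problem.
import numpy as np
from scipy.optimize import minimize

f = [lambda x: (x[0] - 1) ** 2 + 1, lambda x: x[0] ** 2 + x[1] ** 2 + 1]
g = [lambda x: x[0] + 2, lambda x: x[1] + 2]          # positive on the box

def scalarized(x, w=(0.5, 0.5)):
    return sum(wi * fi(x) / gi(x) for wi, fi, gi in zip(w, f, g))

res = minimize(scalarized, x0=np.array([0.5, 0.5]),
               bounds=[(0.0, 1.0), (0.0, 1.0)])
print(res.x)
```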
Nonconvex Sparse Logistic Regression with Weakly Convex Regularization
In this work we propose to fit a sparse logistic regression model by solving a weakly convex regularized nonconvex optimization problem. The idea is based on the finding that a weakly convex function, as an approximation of the ℓ0 pseudo-norm, is able to induce sparsity better than the commonly used ℓ1 norm. For a class of weakly convex sparsity-inducing functions, we prove the nonconvexity of the corres...
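A standard example of such a weakly convex surrogate is the minimax concave penalty (MCP). Below is a hedged proximal-gradient sketch for MCP-regularized logistic regression; the parameter names, the firm-thresholding proximal map, and the step-size rule are assumptions based on the usual MCP definition, not details from the paper.

```python
# Hedged sketch: proximal gradient for sparse logistic regression with the
# MCP penalty, a weakly convex approximation of the l0 pseudo-norm.
import numpy as np

def mcp_prox(z, t, lam, gamma):
    """Proximal map of the MCP penalty with step t (requires gamma > t)."""
    return np.where(np.abs(z) <= t * lam, 0.0,
           np.where(np.abs(z) <= gamma * lam,
                    np.sign(z) * (np.abs(z) - t * lam) / (1.0 - t / gamma),
                    z))

def sparse_logreg_mcp(A, y, lam=0.1, gamma=3.0, n_iter=1000):
    """Labels y in {-1, +1}; minimizes logistic loss + MCP penalty."""
    L = np.linalg.norm(A, 2) ** 2 / 4.0     # Lipschitz constant of the loss gradient
    t = 1.0 / L                              # gamma > t holds for typical data sizes
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        margins = y * (A @ x)
        grad = -A.T @ (y / (1.0 + np.exp(margins)))   # gradient of logistic loss
        x = mcp_prox(x - t * grad, t, lam, gamma)
    return x
```

The firm-thresholding proximal map leaves large coefficients unshrunk, which is the mechanism by which weakly convex penalties reduce the bias of ℓ1 shrinkage.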
On Sequential Optimality Conditions without Constraint Qualifications for Nonlinear Programming with Nonsmooth Convex Objective Functions
Sequential optimality conditions provide adequate theoretical tools to justify stopping criteria for nonlinear programming solvers. Here, nonsmooth approximate gradient projection and complementary approximate Karush-Kuhn-Tucker conditions are presented. These sequential optimality conditions are satisfied by local minimizers of optimization problems independently of the fulfillment of constrai...
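As a concrete instance of how such conditions back a stopping rule, here is a hedged sketch of an approximate gradient-projection residual test for a box-constrained smooth problem; the box and tolerance are illustrative assumptions.

```python
# Hedged sketch: an approximate gradient-projection (AGP) residual used as a
# stopping test for min f(x) over a box [lo, hi]. A sequence along which this
# residual tends to zero satisfies the sequential optimality condition,
# independently of any constraint qualification at the limit point.
import numpy as np

def agp_residual(x, grad, lo, hi):
    """Norm of x - P_C(x - grad), where P_C projects onto the box [lo, hi]."""
    return np.linalg.norm(x - np.clip(x - grad, lo, hi))

# Inside a solver loop one would stop when, e.g.,
#   agp_residual(x, grad_f(x), lo, hi) <= 1e-6
```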
Journal: SIAM Journal on Optimization
Volume: 27, Issue: -
Pages: -
Publication date: 2017